Multiple bioinformatics analysis identifies IGFBP1 as associated with the prognosis of stomach adenocarcinoma

This study aimed to screen the hub gene for predicting the prognosis of patients with stomach adenocarcinoma (STAD). The RNA-sequencing expression data and clinical data of STAD were collected from the cancer genome atlas. The R package “limma” was performed to ascertain the differentially expressed genes (DEGs) between the relapse group and non-relapse group, and the DEGs between the survival dead status group and survival alive status group were screened. The overlapping genes between 2 DEGs sets were identified by the Venn diagram. Many different bioinformatics analysis methods were performed to analyze the importance of hub genes. One gene signature, IGFBP1, was extracted. The KM plot indicated that STAD patients with low IGFBP1 mRNA expression have a shorter overall survival time. The top 100 co-expression genes of IGFBP1 were mainly enriched in complement and coagulation cascades, epithelial cell signaling in Helicobacter pylori infection, and Wnt signaling pathway. Immune infiltration analysis indicated IGFBP1 may inhibit immune cell infiltration in tumors by infiltration and immune escape, leading to tumor metastasis and progression. The bioinformatics analysis results indicate that IGFBP1 can be used as a tool to evaluate the mortality risk of patients with STAD.


Introduction
Based on Global Cancer Statistics 2020 which is an authoritative survey report to estimate incidence and mortality worldwide for 36 cancers in 185 countries by the GLOBOCAN, stomach adenocarcinoma (STAD) still is one of the most common malignant tumors worldwide. It is responsible for over 1 million new cases in 2020 and an estimated 769,000 deaths. The mortality rate is up to 1 in 13, ranking 4th for mortality globally after lung cancer, colorectum cancer, and liver cancer. [1] Moreover, it was a notable finding that increased prevalence of stomach cancer in young people whose age is under 50 years. [2,3] Some studies speculated that it was possible due to the abuse of antibiotics and acid suppressants. [4,5] At present, radiotherapy, chemotherapy, tumor removal, immunotherapy, and surgery are considered effective for STAD in clinical, but the cure rate remains unsatisfactory because of its recurrence and metastasis. [6] its prevalence of recurrence and metastasis exceeded 15% in groups with a combination of different clinical risk factors, [7] to the extent that reduced the survival time of STAD patients, and increased the risk of tumor malignancy. [8] Therefore, the determination of predictive biomarkers of STAD is extremely important. For instance, Zeng et al [9] research showed that LCP1 was a prognostic biomarker correlated with immune infiltrate in gastric cancer. Hu et al [10] study indicated that upregulation of PDLIM3 was significantly associated with poor prognosis in STAD. There are relatively few genetic markers based on recurrence-related. Hence, it is urgent to search for new recurrence and prognosis-related biomarkers for STAD.
In the present study, we aimed to screen for the hub genes that are prognostic and associated with recurrence based on the cancer genome atlas (TCGA)-STAD dataset by bioinformatics methods. As a result, insulin-like growth factor binding protein (IGFBP)-1 was screened. The relationship of its expression with clinical information was analyzed to assess its prognosis value. Moreover, its co-expression genes were Medicine collected from cBioPortal database, and the Kyoto encyclopedia of genes and genomes (KEGG)and gene ontology (GO) enrichment analyses were performed to research the potential function of IGFBP1. Analysis of immune cell infiltration levels was used to explore the relationship between the expression of IGFBP1 and the main 6 kinds of immune cells. All analyses indicated that it could be an independent prognostic factor for STAD.

Data source
The workflow for the present study is shown in Figure 1. The RNA-sequencing expression data and clinical data of STAD were collected from TCGA. Then, patients without overall survival and survival status were excluded, and a total of 392 samples including 357 STAD patients and 35 normal samples were analyzed. Moreover, the gene protein expression detected by immunohistochemistry (IHC) in tumor and normal tissue was explored from the human protein atlas and was verified by IHC experiment.

Differentially expressed genes identification
The STAD patients were collected from TCGA-STAD datasets. Then, those patients were grouped by relapse status and overall survival status. The R package "limma" was performed to normalize the RNA-seq expression data and ascertain the differentially expressed genes (DEGs) between the relapse group and non-relapse group with screening conditions including P < .05 and |log2FC|>1.5 in the TCGA-STAD datasets. The results were visualized by the ggplot2 R software package to draw a volcano plot, and the top 50 up-regulated and down-regulated genes were used as heat maps respectively. Analogously, the DEGs between the overall survival dead status group and overall survival alive status group were screened.

Functional enrichment analysis
The overlapping genes between 2 DEGs sets were identified by the Venn diagram. Then, their co-expression genes were collected from the TCGA database, and the top 100 co-expression genes were selected for further enrichment analysis. The "clusterProfiler" R package 15 was employed to perform KEGG analyses on the overlapping genes. P values were adjusted by the Bebjamini and Hochberg method. The (GO) function enrichment analysis also was performed, and cellular component, biological process (BP), and molecular function (MF) were annotated. These pathways were considered to be significantly enriched when the following criteria were met: nominal P value < .05, false discovery rate q value < 0.25, and absolute normalized enrichment score > 1.

IHC
IHC is a widely used diagnostic technique in tissue pathology. IHC use conventional 2-step staining. Tissues were embedded in paraffin, then sectioned and dewaxed to water. Antigen was repair by microwave heating. Section was washed with phosphate buffered saline and added primary antibody. After overnight incubation at 4°C, phosphate buffered saline was washed clean, and the second antibody was added and incubated at 37°C for 30 minutes. Then, section was colored by DAB. Hematoxylin re-staining, dehydration, transparency, sealing.

Statistical analyses
Kaplan-Meier method was performed to analyze the relationship between the expression of hub genes and the survival of patients with STAD, while the optimal cutoff point was adopted to divide patients into 2 groups, and the log-rank test was used to analyze the differences between survival curves. The receiver operating characteristic (ROC) curves were used to evaluate the prediction of hub gene expression on the survival probability by using the survival ROC package in R. Univariate and multivariable Cox regression analysis was used to assess the correlation between overall survival and hub gene expression in STAD patients with survival packages (version 3.6.4).

Analysis of immune cell infiltration levels
To research the relationship between the expression of the hub gene and 6 kinds of immune cells, we conducted the analysis of immune cell infiltration levels by the Tumor Immune Estimation Resource methods based on the TIMER2.0 database (https://cistrome.shinyapps.io/timer/) which was widely used to study immune cell infiltrates across various different tumors. [11] The related immune cells included B cell, CD8 + T cell, CD4 + T cell, macrophage, neutrophil, and dendritic cell, and the relationship was corrected according to tumor purity. Moreover, the STAD patient was grouped into 2 groups by the expression level of the hub gene to research further correlations between the hub gene and the immune microenvironment score of 6 kinds of immune cells.

Identification of overlapping genes between the 2 DEG lists
A total of 16 DEGs between the overall survival dead status group and overall survival alive status group were screened according to P < .05 and |log2FC|>1.5, including 11 up-regulated genes and 5 down-regulated genes ( Fig. 2A and B). Similarly, 346 DEGs between the overall survival dead status group and overall survival alive status group were screened according to P < .05 and |log2FC|>1.5, including 339 up-regulated genes and 7 down-regulated genes ( Fig. 2C and D). Then, 1 overlapping gene, IGFBP1, between the 2 DEGs was extracted by the Venn plot (Fig. 2E). This gene is a member of the IGFBP family and encodes a protein with an IGFBP N-terminal domain and a thyroglobulin type-I domain.

Differential expression of IGFBP1
In order to reveal the clinical value of IGFBP1, we explored further its expression in STAD. The differential expression analysis was carried out. The result showed that its mRNA had a higher expression in STAD tissues than in normal tissues with www.md-journal.com significance both in TCGA-STAD datasets ( Fig. 3A; P < .01) and paired samples ( Fig. 3B; P < .01). Figure 3C and D showed that STAD protein expression by the immunohistochemical method was different between stomach cancer tissue and normal tissue according on human protein atlas database. Moreover, the level of IGFBP1 protein in STAD tissue was different with in normal tissue, which was verified by IHC experiment and the results are shown in Figure 4.

Survival analysis
The Kaplan-Meier analysis with the log-rank test was conducted to research the relationship between IGFBP1 expression and patients overall survival. The results indicated that STAD patients with low IGFBP1 mRNA expression have shown a shorter overall survival time (Fig. 5A). According to the ROC analysis result that area under the curve of receiver operating characteristic curve was 0.55, the expression of IGFBP1 had a prediction capacity on the survival probability of patients with STAD (Fig. 5B). To further explore the relationship between the expression of IGFBP1 and the overall survival (OS), the COX regression was carried out. The results indicated that the expression group of IGFBP1 was closely related to OS in both Univariable COX regression (HR: 0.62, 95% CI: 0.38-0.99, P = .04) and multivariate COX regression (HR: 0.59, 95% CI: 0.37-0.95, P = .03) while other clinical features did not relate to OS (Table 1).

Functional enrichment analyses
Co-expression genes of IGFBP1 were downloaded from the cBio-Portal database. KEGG enrichment analyses were conducted  based on the top 100 co-expression genes to identify IGFBP1related pathways and biological functions and significant pathways were shown in Figure 6A. The result showed that those genes were mainly enriched in complement and coagulation cascades, rheumatoid arthritis, collecting duct acid secretion, vibrio cholerae infection, epithelial cell signaling in Helicobacter pylori infection, and Wnt signaling pathway (Fig. 6A). The GO covers cellular components, BP, and MFs, and is widely adopted in functional enrichment analyses. In GO enrichment analysis, cell components were mainly engaged in the extracellular region, vesicle, extracellular space, apical plasma membrane, extracellular matrix, endoplasmic reticulum lumen, collagen-containing extracellular matrix, vacuolar proton-transporting V-type ATPase complex (Fig. 6B). The BP mainly involves reactive oxygen species metabolic process, thyroid hormone generation, hydrogen peroxide biosynthetic process, thyroid hormone metabolic process, antibiotic biosynthetic process, positive regulation of amine transport, ATP hydrolysis coupled ion transmembrane transport (Fig. 6C). Moreover, MF analysis showed that these genes were related to transmembrane transporter activity, transporter activity, ion transmembrane transporter activity, inorganic molecular entity transmembrane transporter activity, cation transmembrane transporter activity, monovalent inorganic cation transmembrane transporter activity, anion transmembrane transporter activity sodium ion transmembrane transporter activity, ankyrin binding basic amino acid transmembrane transporter activity (Fig. 6D).

Discussion
In spite of various treatments, STAD remains a common malignant tumor of the digestive system with a high mortality rate. [12] Moreover, the prognosis of patients with STAD is hard to predict individually using uniform clinical criteria due to its high heterogeneity. Therefore, it is necessary to find new biomarkers to perfect the prediction system for the prognosis of patients with STAD. The recurrence of STAD affects prognosis. However, fewer studies have focused on screening STAD prognosis biomarkers based on the recurrence-related gene. Therefore, in this work, DEGs between the overall survival dead status group and overall survival alive status group were screened based on TCGA-STAD datasets. Meanwhile, DEGs between the overall survival dead status group and overall survival alive status group were screened. Then, 1 prognosis and recurrence-related gene, IGFBP1, was extracted between the 2 DEGs. IGFBP1 is a member of the IGFBP family and encodes a protein with an IGFBP N-terminal domain and a thyroglobulin type-I domain. It is rapidly metabolized in the circulation and insulin is the main regulator of its expression. The IGFBP1 family is able to regulate the activity of insulin-like growth factors (IGFs) by highly binding to them. IGFs is a multifunctional regulator of cell proliferation that promotes cell differentiation, proliferation, individual growth and development, and inhibits apoptosis. [13] Moreover, IGF-1 was at a high expression level in STAD and promotes the proliferation and invasion of tumor cells through the JAK1/STAT3 signaling pathway. [14] Recent in vivo studies have indicated that the IGFBP1 serves as a molecular switch restricting IGF signaling and diverts the limited energy resources from growth and development towards metabolic processes essential for survival under stress conditions. [15,16] In the present study, we found that the IGFBP1 was a high expression in STAD tissue and its high expression was beneficial to the prognosis of patients with STAD. Therefore, we speculate that in STAD cells, IGFBP1 inhibits the IGF signaling pathway by binding to IGFs, thereby inhibiting cell growth. However, this result is in contrast to previously reported studies that high IGFBP1 expression was associated with hematogenous metastasis and poor survival due to different research patients. Specifically, all STAD patients with or without various treatments were included in this work, while only patients with gastric cancer who underwent gastrectomy were collected in the reported study. [17] Luo et al [18] have investigated that the expression and release of IGFBP1 were increased, with enhanced expression being associated with the migration ability of cancer cells in gastric cancer infected with H pylori, and suggesting that IGFBP1 may be a tumor-suppressor gene in the process of H pylori-induced STAD. Consistently, the KEGG enrichment analysis showed that the co-expression genes of IGFBP1 were related to epithelial cell signaling in the H pylori infection in the present study. H pylori infection is strongly associated with the development of STAD because H pylori infection can initiate inflammatory pathways in the gastric mucosa, and activate signaling pathways that relate the development and process of STAD. The previously reported studies have shown that H pylori improved ROS level and induced DNA damage in gastric epithelial cells. Moreover, the infection of H pylori activated P13K/AKT signaling pathways that were ROS mediated. [19] In addition, H pylori virulence factor CagA can inhibit the proliferation of gastric cancer cells and promote the apoptosis of gastric cancer cells, and its mechanism may be related to the activation of the ERK signaling pathway by CagA. [20] Patients with the same histological types of cancer may have different clinical outcomes due to different of immune cell infiltration. [21] In this study, there was a negative association between the expression of IGFBP1 and CD8 + T cell, DC cell. In addition, the immune microenvironment score of the CD4 + T cell, CD8 + T cell, and DC was in the IGFBP1 low expression group significantly high than in the GFBP1 high expression group. These results suggest that IGFBP1 may inhibit immune cell infiltration in tumors by infiltration and immune escape, leading to tumor metastasis and progression.

Conclusion
In the present study, a gene biomarker of STAD was screened based on tumor recurrence-related and prognosis-related genes. The bioinformatics analysis results indicate that the gene signature can be used as a tool to evaluate the mortality risk of patients with STAD. However, this work remains some limitations due to the lack of experiments in vivo.